Operadores de Seleção por Similaridade para Sistemas de Gerenciamento de Bases de Dados Relacionais
Abstract
Searching operations in complex datasets are performed using comparison criteria based on similarity, because equality comparisons are rarely useful and comparisons based on ordering relationships cannot be applied due to the nature of these datasets. There are two basic operators for similarity queries: Range Query and k-Nearest Neighbor Query. A great deal of research has been done to achieve effective algorithms for these operators. However, algorithms that deal with these operators as parts of a more complex operation (compositions of them) have not yet been developed. This article presents two new algorithms, named kAndRange and kOrRange, which are designed to answer conjunctions and disjunctions of these similarity criteria. The new algorithms were tested with sequential scan and with a metric access method called Slim-tree. The experimental results, obtained with real and synthetic datasets, show that the new algorithms outperform the composition of the two operators when answering these complex similarity queries in all measured aspects, being up to 40 times faster. This is an essential point that will enable the practical use of similarity operators in Relational
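To make the two basic operators and their composition concrete, the following is a minimal Python sketch of the baseline the abstract compares against: running a Range Query and a k-Nearest Neighbor Query separately with a sequential scan, then intersecting or uniting the results. It is not the kAndRange or kOrRange algorithm, whose details are not given here. The function names, the Euclidean distance, and the tie-breaking by index are all assumptions made for illustration.

```python
import heapq
import math
from typing import Callable, Sequence, Set, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Assumed metric for the sketch; any metric distance could be used."""
    return math.dist(a, b)


def range_query(data: Sequence[Point], center: Point, radius: float,
                dist: Distance = euclidean) -> Set[int]:
    """Indices of all objects within `radius` of `center` (sequential scan)."""
    return {i for i, obj in enumerate(data) if dist(obj, center) <= radius}


def knn_query(data: Sequence[Point], center: Point, k: int,
              dist: Distance = euclidean) -> Set[int]:
    """Indices of the k objects closest to `center` (sequential scan).

    Ties are broken by index, which is an arbitrary choice for this sketch.
    """
    nearest = heapq.nsmallest(
        k, range(len(data)), key=lambda i: (dist(data[i], center), i)
    )
    return set(nearest)


def naive_k_and_range(data: Sequence[Point], center: Point, k: int,
                      radius: float) -> Set[int]:
    """Conjunction by composition: two full scans, then set intersection."""
    return knn_query(data, center, k) & range_query(data, center, radius)


def naive_k_or_range(data: Sequence[Point], center: Point, k: int,
                     radius: float) -> Set[int]:
    """Disjunction by composition: two full scans, then set union."""
    return knn_query(data, center, k) | range_query(data, center, radius)


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
    query = (0.0, 0.0)
    print(sorted(naive_k_and_range(points, query, k=3, radius=1.5)))
    print(sorted(naive_k_or_range(points, query, k=3, radius=1.5)))
```

Each composed query here scans the dataset twice and computes every distance twice, which is the kind of redundant work a single combined operator can avoid.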
Similar resources
A Tecnologia Objeto-Relacional em Ambientes de Data Warehouse: Uso de Séries de Tempo como Tipo de Dado Não Convencional
This article discusses the use of object-relational (OR) technology in Data Warehouse (DW) environments. In particular, it presents an analysis of the feasibility of using time series as a non-conventional data type in a DW. The time dimension is fundamental in any DW, since these systems aim to store historical data derived from various heterogeneous systems, ...
Uma Abordagem para Armazenamento de Dados Semi-Estruturados em Bancos de Dados Relacionais
This paper presents an approach to storing semistructured data in relational databases. We focus on semistructured data as extracted from Web pages by a tool called DEByE (Data Extraction By Example), and organized according to its data model, the DEByE Object Model (DEByE-OM). The approach presented here consists in representing the structure of objects extracted by DEByE by a relational schem...
Uma Estratégia para Seleção de Atributos Relevantes no Processo de Resolução de Entidades
Data integration is an essential task for achieving a unified view of data stored in heterogeneous and distributed sources. A key step in this process is the Entity Resolution, which consists of identifying instances that refer to the same real-world entity. Functions that evaluate the similarity between values of attributes are used to identify equivalent instances. This work proposes a strate...
Ambiente de gerenciamento de imagens e dados espaciais para desenvolvimento de aplicações em biodiversidade
There is a wide range of environmental applications requiring sophisticated management of several kinds of data, including spatial data and images of living beings. However, available information systems offer very limited support for managing such data in an integrated manner. This thesis provides a solution to combine these query requirements, which takes advantage of current digital library ...
Estratégias de Seleção de Conteúdo com Base na CST (Cross-document Structure Theory) para Sumarização Automática Multidocumento
This work presents the definition, formalization, and evaluation of content selection strategies for multi-document automatic summarization based on the discourse theory CST (Cross-document Structure Theory). The content selection task was modeled by means of operators that represent possible user preferences for summarization. These operators are specified in ...
Publication date: 2003